Smart Paper Notes

SD-GAN: Semantic Decomposition for Face Image Synthesis with Discrete Attribute

Zhou Kangneng , Zhu Xiaobin , Gao Daiheng , Lee Kai , Li Xinjie , Yin Xu-Cheng

Category: Computer Vision

2022-07-12

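Manipulating latent codes in generative adversarial networks (GANs) for facial image synthesis has mainly focused on continuous attributes (e.g., age, pose, and emotion), while discrete attributes (such as face masks and eyeglasses) have received less attention. Directly applying existing methods to discrete facial attributes may produce inaccurate results. In this work, we propose an innovative framework, dubbed SD-GAN, that tackles the challenging task of discrete facial attribute synthesis via semantic decomposition. Concretely, we explicitly decompose the discrete attribute representation into two components: a semantic prior basis and an offset latent representation. The semantic prior basis gives an initial direction for manipulating the face representation in the latent space. The offset latent representation, obtained by a 3D-aware semantic fusion network, adjusts the prior basis. In addition, the fusion network integrates 3D embeddings for better identity preservation and discrete attribute synthesis. Combining the prior basis with the offset latent representation enables our method to synthesize photo-realistic face images with discrete attributes. Notably, we construct a large and valuable dataset, MEGN (face mask and eyeglasses images crawled from Google and Naver), to compensate for the lack of discrete attributes in existing datasets. Extensive qualitative and quantitative experiments demonstrate the state-of-the-art performance of our method. Our code is available at: https://github.com/montaellis/sd-gan.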
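
The decomposition described in this abstract can be pictured with a small sketch. It is not the authors' implementation: the additive combination, the `strength` parameter, the 512-dimensional latent size, and the random placeholder standing in for the fusion network's output are all assumptions made for illustration.

```python
import numpy as np


def edit_latent(w, prior_basis, offset, strength=1.0):
    """Shift a latent code along an attribute direction.

    Assumption: the direction is the sum of a fixed semantic prior basis
    and a per-image offset; the paper may combine them differently.
    """
    direction = prior_basis + offset
    return w + strength * direction


rng = np.random.default_rng(0)
latent_dim = 512                                 # hypothetical latent size
w = rng.normal(size=latent_dim)                  # latent code of an input face
prior_basis = rng.normal(size=latent_dim)        # stand-in for a learned direction
offset = 0.1 * rng.normal(size=latent_dim)       # stand-in for the fusion network output
w_edited = edit_latent(w, prior_basis, offset, strength=1.5)
```

In this reading, the prior basis is shared across images while the offset is predicted per image, which is what lets the edit adapt to each face.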

Translated by Google Translate

ULTRA: Uncertainty-aware Label Distribution Learning for Breast Tumor Cellularity Assessment

Xiangyu Li , Xinjie Liang , Gongning Luo , Wei Wang , Kuanquan Wang , Shuo Li

Category: Computer Vision

2022-06-14

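Neoadjuvant therapy (NAT) for breast cancer is a common treatment option in clinical practice. Tumor cellularity (TC), which represents the percentage of invasive tumor in the tumor bed, has been widely used to quantify the response of breast cancer to NAT. Automatic TC estimation is therefore important in clinical practice. However, existing state-of-the-art methods usually treat it as a TC score regression problem, which ignores the ambiguity of TC labels caused by subjective assessment or multiple raters. In this paper, to exploit this label ambiguity effectively, we propose an Uncertainty-aware Label disTRibution leArning (ULTRA) framework for automatic TC estimation. ULTRA first converts the single-value TC labels into discrete label distributions, which effectively models the ambiguity among all possible TC labels. The network then learns TC label distributions by minimizing the Kullback-Leibler (KL) divergence between the predicted and ground-truth TC label distributions, which better supervises the model to exploit the ambiguity of TC labels. Moreover, ULTRA mimics the multi-rater fusion process in clinical practice with a multi-branch feature fusion module to further explore the uncertainty of TC labels. We evaluate ULTRA on the public BreastPathQ dataset. Experimental results show that ULTRA outperforms regression-based methods by a large margin and achieves state-of-the-art results. The code will be available from https://github.com/perceptioncomputinglab/ultra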
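
The two steps named in this abstract, turning a scalar label into a discrete distribution and scoring a prediction with KL divergence, can be sketched as follows. The abstract does not say which distribution is used, so the Gaussian shape, the 101 bins, and the `sigma` value are assumptions, and the function names are hypothetical.

```python
import numpy as np


def label_to_distribution(tc, num_bins=101, sigma=0.05):
    """Convert a scalar TC score in [0, 1] into a discrete distribution.

    Assumption: a Gaussian centred on the label, evaluated on evenly
    spaced bins and normalised to sum to one.
    """
    centers = np.linspace(0.0, 1.0, num_bins)
    logits = -0.5 * ((centers - tc) / sigma) ** 2
    probs = np.exp(logits - logits.max())  # subtract max for numerical stability
    return probs / probs.sum()


def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted) for two discrete distributions."""
    target = np.clip(target, eps, 1.0)
    predicted = np.clip(predicted, eps, 1.0)
    return float(np.sum(target * (np.log(target) - np.log(predicted))))


target = label_to_distribution(0.30)
close_guess = label_to_distribution(0.32)
far_guess = label_to_distribution(0.80)
loss_close = kl_divergence(target, close_guess)
loss_far = kl_divergence(target, far_guess)
```

Unlike a plain regression loss on a single number, the target here spreads probability over neighbouring scores, so a prediction near the label is penalised far less than a distant one.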

Semantic Bilinear Pooling for Fine-Grained Recognition

Xinjie Li , Chun Yang , Songlu Chen , Chao Zhu , Xu-Cheng Yin

Category: Computer Vision

2019-04-03

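Fine-grained recognition, such as vehicle identification or bird classification, naturally comes with hierarchical labels, in which the fine levels are always harder to classify than the coarse ones. However, most recent deep-learning-based methods neglect the semantic structure of fine-grained objects and do not take advantage of traditional fine-grained recognition techniques (e.g., coarse-to-fine classification). In this paper, we propose a novel framework with a two-branch network (a coarse branch and a fine branch), namely semantic bilinear pooling, for fine-grained recognition with a hierarchical label tree. The framework can adaptively learn semantic information from the hierarchy. Specifically, we design a generalized cross-entropy loss for training the proposed framework that fully exploits the semantic priors by considering the relevance between adjacent levels and the distance between samples of different coarse classes. Furthermore, our method uses only the fine branch at test time, so no overhead is added at inference. Experimental results show that the proposed method achieves state-of-the-art performance on four public datasets.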

Learning to Play Trajectory Games Against Opponents with Unknown Objectives

Xinjie Liu , Lasse Peters , Javier Alonso-Mora

Category: Robotics

2022-11-24

Many autonomous agents, such as intelligent vehicles, are inherently required to interact with one another. Game theory provides a natural mathematical tool for robot motion planning in such interactive settings. However, tractable algorithms for such problems usually rely on a strong assumption, namely that the objectives of all players in the scene are known. To make such tools applicable for ego-centric planning with only local information, we propose an adaptive model-predictive game solver, which jointly infers other players' objectives online and computes a corresponding generalized Nash equilibrium (GNE) strategy. The adaptivity of our approach is enabled by a differentiable trajectory game solver whose gradient signal is used for maximum likelihood estimation (MLE) of opponents' objectives. This differentiability of our pipeline facilitates direct integration with other differentiable elements, such as neural networks (NNs). Furthermore, in contrast to existing solvers for cost inference in games, our method handles not only partial state observations but also general inequality constraints. In two simulated traffic scenarios, we find superior performance of our approach over both existing game-theoretic methods and non-game-theoretic model-predictive control (MPC) approaches. We also demonstrate our approach's real-time planning capabilities and robustness in two hardware experiments.

Latent Heterogeneous Graph Network for Incomplete Multi-View Learning

Pengfei Zhu , Xinjie Yao , Yu Wang , Meng Cao , Binyuan Hui , Shuai Zhao , Qinghua Hu

Category: Machine Learning | Computer Vision

2022-08-29

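Multi-view learning has progressed rapidly in recent years. Although many previous studies assume that each instance appears in all views, in real-world applications it is common for instances to be missing from some views, resulting in incomplete multi-view data. To tackle this problem, we propose a novel Latent Heterogeneous Graph Network (LHGN) for incomplete multi-view learning, which aims to use multiple incomplete views as fully as possible in a flexible manner. By learning a unified latent representation, a trade-off between consistency and complementarity among different views is implicitly realized. To explore the complex relationship between samples and latent representations, a neighborhood constraint and a view constraint are proposed, for the first time, to construct a heterogeneous graph. Finally, to avoid any inconsistency between the training and test phases, a transductive learning technique based on graph learning is applied to classification tasks. Extensive experimental results on real-world datasets demonstrate the effectiveness of our model over existing state-of-the-art approaches.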

RCA: Ride Comfort-Aware Visual Navigation via Self-Supervised Learning

Xinjie Yao , Ji Zhang , Jean Oh

Category: Robotics | Artificial Intelligence

2022-07-29

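Under shared autonomy, wheelchair users expect vehicles to provide safe and comfortable rides while following the user's high-level navigation plan. To find such a path, vehicles negotiate different terrains and assess their traversal difficulty. Most prior works model the surroundings through either geometric representations or semantic classification, which do not reflect the perceived motion intensity and ride comfort in downstream navigation tasks. We propose to model ride comfort explicitly in traversability analysis using proprioceptive sensing. We develop a self-supervised learning framework that predicts a traversability costmap from first-person-view images by using vehicle states as training signals. Our approach estimates, from terrain appearance, how the vehicle would feel if it traversed that terrain. We then show, through robot experiments together with a human evaluation study, that our navigation system provides human-preferred ride comfort.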

Safe Model Predictive Control Approach for Non-holonomic Mobile Robots

Xinjie Liu , Vassil Atanassov

Category: Robotics

2022-07-26

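We design an MPC approach for non-holonomic mobile robots and show analytically that the time-varying linearized system can yield asymptotic stability around the origin in the tracking task. To avoid obstacles, we propose a constraint in velocity space that explicitly couples the two control inputs based on the current state.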
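
To make "time-varying linearized system" concrete, here is a minimal sketch under stated assumptions: the abstract does not name the robot model, so the unicycle kinematics, the forward-Euler discretization, and the function names below are illustrative choices rather than the authors' formulation.

```python
import numpy as np


def unicycle_step(state, control, dt):
    """One forward-Euler step of unicycle kinematics.

    state = (x, y, theta), control = (v, omega). The unicycle is an
    assumed stand-in for a non-holonomic mobile robot.
    """
    x, y, theta = state
    v, omega = control
    return np.array([
        x + dt * v * np.cos(theta),
        y + dt * v * np.sin(theta),
        theta + dt * omega,
    ])


def linearize(state_ref, control_ref, dt):
    """Jacobians (A, B) of unicycle_step at a reference point.

    A and B change with the reference heading and speed, so the
    linearized model is time-varying along a reference trajectory.
    """
    theta = state_ref[2]
    v = control_ref[0]
    A = np.array([
        [1.0, 0.0, -dt * v * np.sin(theta)],
        [0.0, 1.0, dt * v * np.cos(theta)],
        [0.0, 0.0, 1.0],
    ])
    B = np.array([
        [dt * np.cos(theta), 0.0],
        [dt * np.sin(theta), 0.0],
        [0.0, dt],
    ])
    return A, B


state_ref = np.array([0.5, -0.2, 0.3])
control_ref = np.array([1.0, 0.4])
A, B = linearize(state_ref, control_ref, dt=0.1)
```

An MPC built on this model would re-evaluate `A` and `B` at every point of the reference trajectory and optimize over the deviations from it.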

Omni-swarm: A Decentralized Omnidirectional Visual-Inertial-UWB State Estimation System for Aerial Swarms

Hao Xu , Yichen Zhang , Boyu Zhou , Luqi Wang , Xinjie Yao , Guotao Meng , Shaojie Shen

Category: Robotics

2021-03-06

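Decentralized state estimation is one of the most fundamental components of autonomous aerial swarm systems in GPS-denied areas, yet it remains a highly challenging research topic. This paper proposes Omni-swarm, a decentralized omnidirectional visual-inertial-UWB state estimation system for aerial swarms, to address this research niche. To solve the issues of observability, complicated initialization, insufficient accuracy, and lack of global consistency, we introduce an omnidirectional perception front-end in Omni-swarm. It consists of stereo wide-field-of-view cameras and ultra-wideband (UWB) sensors, visual-inertial odometry, multi-drone map-based localization, and visual drone tracking algorithms. The measurements from the front-end are fused by graph-based optimization in the back-end. As the experimental results show, the proposed method achieves centimeter-level relative state estimation accuracy while guaranteeing global consistency in the aerial swarm. Moreover, with the support of Omni-swarm, inter-drone collision avoidance can be accomplished without any external devices, demonstrating the potential of Omni-swarm as a foundation for autonomous aerial swarms.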

Cross Modal Transformer via Coordinates Encoding for 3D Object Detection

Junjie Yan , Yingfei Liu , Jianjian Sun , Fan Jia , Shuailin Li , Tiancai Wang , Xiangyu Zhang

Category: Computer Vision

2023-01-03

In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes the image and point clouds tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple while its performance is impressive. CMT obtains 73.0% NDS on nuScenes benchmark. Moreover, CMT has a strong robustness even if the LiDAR is missing. Code will be released at https://github.com/junjie18/CMT.

Backdoor Attacks Against Dataset Distillation

Yugeng Liu , Zheng Li , Michael Backes , Yun Shen , Yang Zhang

Category: Machine Learning

2023-01-03

Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
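
The simpler of the two attacks, adding a trigger to the raw data before distillation starts, can be illustrated with a toy sketch. The abstract does not specify the trigger, so the corner patch, the poisoning rate, the relabeling to a target class, and the function names are assumptions borrowed from common patch-style backdoor setups, not the authors' code.

```python
import numpy as np


def add_trigger(images, patch_size=3, value=1.0):
    """Stamp a square patch in the bottom-right corner of each image.

    images has shape (N, H, W, C); a copy is returned.
    """
    poisoned = images.copy()
    poisoned[:, -patch_size:, -patch_size:, :] = value
    return poisoned


def poison_dataset(images, labels, target_label, rate, rng, patch_size=3):
    """Trigger and relabel a random fraction of the raw data.

    Assumption: poisoned samples are relabeled to target_label, as in
    standard patch-based backdoor attacks.
    """
    num_poison = int(round(rate * len(images)))
    idx = rng.choice(len(images), size=num_poison, replace=False)
    images = images.copy()
    labels = labels.copy()
    images[idx] = add_trigger(images[idx], patch_size=patch_size)
    labels[idx] = target_label
    return images, labels, idx


rng = np.random.default_rng(0)
raw_images = np.zeros((10, 8, 8, 3))   # toy stand-in for an image dataset
raw_labels = np.arange(10) % 5
poisoned_images, poisoned_labels, poisoned_idx = poison_dataset(
    raw_images, raw_labels, target_label=0, rate=0.3, rng=rng
)
```

The poisoned set would then be handed to the distillation procedure unchanged. DOORPING differs in that the trigger itself is updated throughout distillation, which this sketch does not cover.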
